PROJECTIONS AND PLANS DEPARTMENT


PREFACE 


This Research Guide complements a two-volume document entitled
A General Handbook for Long-Range Environmental Forecasting, published
by CACI, Inc. in February 1973. Volume I of the Handbook discusses
important techniques that are applicable to long-range environmental
forecasting and presents a bibliography of long-range forecasting
studies, while Volume II describes data files that may be of use to
long-range forecasters in the national security community.

The Research Guide is designed to be used in conjunction with the
Handbook.  The  Guide  itself  does  not  describe  forecasting  techniques 
but  demonstrates  how  the  techniques  discussed  in  the  Handbook  can 
be  applied  to  long-range  forecasting.  The  Guide  is  divided  into  four 
chapters.  Chapter  1 describes  the  building  of  a model;  Chapter  2 
discusses  the  data  collection  phase  of  the  forecasting  effort;  Chapter  3 
describes  parameter  estimation  or  the  manner  in  which  variables  are 
related to one another; and Chapter 4 presents a hypothetical
long-range forecast that uses the forecasting procedures described in the
first three chapters of the Guide.

The  document  was  prepared  in  conjunction  with  the  work  that  the 
Projections  and  Plans  Department  of  CACI,  Inc.  has  done  for  the 
Defense  Advanced  Research  Projects  Agency  under  Contract  No. 
DAHC15-71-C-0201, Modification Nos. P00011 and P00013.
CHAPTER 1: THE CONCEPTUAL MODEL


A "conceptual  model"  is  a representation  of  the  interrelationships 
among the components of a system. It may be based on hunches,
intuition, experience, superstition, "book learning" or, more
commonly, all of these. In forecasting, anyone who attempts to predict
a future  event  or  explain  a historical  one  has  a conceptual  model  that 
permits  him  to  filter  out  the  unimportant  and  link  what  is  important  in 
comprehensible  ways.  Thus  any  forecast  requires  a conceptual  model, 
and  a systematic  forecast  requires  that  a conceptual  model  be  devel- 
oped systematically. 

The  development  of  such  a conceptual  model  has  two  distinct  phases-- 
isolating  the  important  factors  and  identifying  the  causal  relationships 
among  them.  Factors  are  those  measurable  aspects  of  a situation  that 
either  change  (variables)  or  remain  constant  (parameters),  and  are  the 
focus  of  the  investigation.  Relationships  link  changes  in  some  factors 
with  changes  in  others.  The  modus  operandi  of  any  forecasting  metho- 
dology is  to  establish  these  relationships. 

The  most  systematic  forms  of  forecasting  use  statistical  methods  to 
determine  the  most  appropriate  "functional  form"  of  a relationship 
among  factors  by  analyzing  how  well  such  forms  explain  or  predict 
past  history  as  represented  by  data.  Once  the  best  relationship  is 
determined,  it  is  applied  to  current  data  on  the  assumption  that  the 
best  relationship  for  explaining  the  past  will  be  most  appropriate  for 
predicting the future. The first step in the process, however, is to
identify  the  factors  in  the  model. 

Factors 

Factors  consist  of  parameters  and  variables.  Variables  are  factors 
that  change  whereas  parameters  are  factors  that  remain  constant.  An 
example  of  a parameter  is  the  number  of  permanent  members  in  the 
United  Nations  Security  Council.  A variable,  on  the  other  hand,  is  a 
quantity  that  changes  such  as  the  number  of  accusations  the  Indian 
Government  directs  at  the  Pakistani  Government  each  month. 

The  impact  of  time  on  the  various  factors  of  a system  is  crucial  when 
building  a conceptual  forecasting  model.  In  the  very  long  run  most  fac- 
tors tend  to  be  variables,  while  in  the  very  short  run  most  factors  tend 
to  be  parameters.  An  example  may  help  illustrate  the  importance  of 
the  time  frame  in  differentiating  parameters  from  variables.  Consider 
a forecast  of  Yugoslavia's  alignment  with  the  United  States  over  the 
next  6 months.  In  this  short-run  forecast  many  factors  are  parameters. 
We can assume that Yugoslavia will maintain its Marxist form of
government and that Tito will continue to be the head of state and hence
influence Yugoslavia's alignment posture. On the basis of those parameters,
a forecast of Yugoslavia's alignment can be generated. If the time
frame of the forecast is extended to 10 years, however, many of the
factors  will  become  variables.  It  is  questionable,  for  example,  whether 
Tito  will  still  be  in  public  service  in  10  years,  let  alone  chief  of  state. 
Furthermore,  over  the  next  10  years,  Yugoslavia  might  adopt  a more 
or  a less  authoritarian  form  of  Marxist  rule. 

Variables.  Within  the  context  of  a particular  model,  variables  are 
either exogenous or endogenous. The values of exogenous variables
are determined outside the system considered; that is, they are
taken as given. Thus the American GNP may be exogenous to the
level of conflict between the United States and Canada. This
variable may partially determine changes in the level of conflict, but
such conflict is not likely to determine changes in the American
GNP.

Exogenous variables are of two types--nonmanipulable and manipulable.
Nonmanipulable exogenous variables, such as time, earthquake activity,
and to a lesser degree weather and culture, are not affected by human
decisions. Manipulable exogenous variables, on the other hand, are
those that can be consciously changed, such as
the  level  of  foreign  aid  the  United  States  gives  to  a less  developed 
country.  Thus,  if  a developing  country  becomes  more  aligned  with 
the  United  States  when  given  more  aid,  the  United  States  can  partially 
determine  the  alignment  of  that  country  by  furnishing  it  with  a certain 
level  of  aid.  Foreign  aid,  in  this  instance,  is  a manipulable  exoge- 
nous variable. 

The  values  of  endogenous  variables  are  determined  by  other  variables 
that can be either endogenous or exogenous. An example of an
endogenous variable is the level of American-Canadian conflict in the
above example.

Parameters. Parameters can be "conceptual" or "structural." A
conceptual  parameter  is  a factor  that  remains  unchanged  throughout 
the  period  examined.  Consider,  for  example,  a forecast  of  future 
Arab-Israeli  areas  of  conflict.  A conceptual  parameter  to  this  fore- 
cast is  the  continued  presence  of  tensions  between  the  parties.  All 
the data available on the relations between the Arab states and Israel
refer  to  periods  of  conflict;  thus  there  are  no  data  for  periods  in 
which  the  relations  were  not  hostile.  Structural  parameters,  on  the 
other  hand,  are  parameters  that  are  calculated  by  some  empirical 
technique  and  are  used  directly  in  the  forecast  to  tie  variables  toge- 
ther. 

The relationship between parameters and variables is shown in
Figure 1. Figure 1 indicates that various types of factors can
affect values of endogenous variables. Endogenous variables are the
variables whose values are determined by the model. The figure
demonstrates that the values of endogenous variables depend on the
values of parameters and other variables. Thus the internal stability
of a country may depend on its GNP per capita (a nonmanipulable
exogenous variable), the level of government expenditures (a manipulable
exogenous variable), the number of different racial groups in the
country (a conceptual parameter), and the derived number of protests
per dollar of government expenditure (a structural parameter).

Figure 1. General Description of a Forecasting Model

Unspecified effects are usually called "errors" or "residuals."
These effects account for random noise, inaccuracies that arise
from "left-out" variables, and inaccuracies that occur in
linearizing a nonlinear relationship.
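
The classification of factors described above can be written down as a small data structure. The following is only a sketch: the class and field names are hypothetical, and the entries simply restate the internal stability example from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Factor types named in the text."""
    ENDOGENOUS = "endogenous variable"
    NONMANIPULABLE_EXOGENOUS = "nonmanipulable exogenous variable"
    MANIPULABLE_EXOGENOUS = "manipulable exogenous variable"
    CONCEPTUAL_PARAMETER = "conceptual parameter"
    STRUCTURAL_PARAMETER = "structural parameter"


VARIABLE_KINDS = {
    Kind.ENDOGENOUS,
    Kind.NONMANIPULABLE_EXOGENOUS,
    Kind.MANIPULABLE_EXOGENOUS,
}


@dataclass(frozen=True)
class Factor:
    name: str
    kind: Kind

    @property
    def is_variable(self) -> bool:
        # Variables change over the time frame; parameters do not.
        return self.kind in VARIABLE_KINDS


# The internal stability example, restated as data.
model = [
    Factor("internal stability", Kind.ENDOGENOUS),
    Factor("GNP per capita", Kind.NONMANIPULABLE_EXOGENOUS),
    Factor("government expenditures", Kind.MANIPULABLE_EXOGENOUS),
    Factor("number of racial groups", Kind.CONCEPTUAL_PARAMETER),
    Factor("protests per dollar of expenditure", Kind.STRUCTURAL_PARAMETER),
]

variables = [f.name for f in model if f.is_variable]
parameters = [f.name for f in model if not f.is_variable]
```

Because the split between variables and parameters depends on the time frame, a longer forecast would simply assign different kinds to the same factors.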

By  using  Figure  1,  a preliminary  forecasting  model  can  be  generated. 
For this task to be successful, the analyst must first identify the
objectives of his forecast. This will usually involve determining the
appropriate time frame for the forecast. If the analyst is an expert
in the area, he can pick those factors that will bring about his
objectives, separate the factors into variables and parameters, and
distinguish between endogenous and exogenous variables. In this manner,
he  can  build  a conceptual  model  which  is  then  used  to  generate  the 
forecast.  Thus  an  international  relations  specialist  in  the  State  De- 
partment working  at  the  Indian  Desk  who  is  required  to  forecast  future 
Indian  foreign  policy  should  possess  the  expertise  to  choose  relevant 
factors  for  the  forecast.  Often,  however,  the  national  security  analyst 
may  lack  the  required  training  for  this  task.  In  these  cases,  he  will 
require  help  in  selecting  the  appropriate  factors  that  are  necessary 
to  build  the  forecasting  model. 


A PROCEDURE TO IDENTIFY RELEVANT FACTORS IN A FORECASTING MODEL


There  are  various  ways  whereby  expert  judgment  can  be  employed  to 
identify  the  important  factors  of  a forecasting  model.  Of  particular 
importance  are  Consensus  Methods. 


Consensus  Methods.  There  are  two  techniques  within  the  general 
category  of  Consensus  Methods  that  are  particularly  applicable  in 
identifying  the  relevant  factors  of  a forecasting  model.  They  are  the 
Committee  Approach  and  the  Delphi  Technique. 

Committee  Approach.  This  procedure  involves  relying 
on  the  opinion  of  experts.  The  output  of  this  exercise 
is  a list  of  factors  that  bring  about  the  objectives  of 
the  forecaster.  The  Committee  Approach  is  essentially 
a nonstructured  way  to  arrive  at  the  list. 

Delphi  Technique.  This  method,  somewhat  more  sys- 
tematic than  the  above,  involves  four  steps  to  produce 
a list  of  required  factors. 

• Have  a panel  of  experts  list  factors  that  bring 
about  objectives. 

• Compile  a master  list  with  each  factor  accom- 
panied by  the  number  of  times  it  is  mentioned. 

• Submit  the  new  list  to  the  experts  and  request 
revisions. 

• Repeat  the  first  two  steps  until  the  list  no 
longer  changes. 
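
The Delphi steps listed above amount to a tally-and-revise loop. The sketch below is illustrative only: the function names, the `revise` callback, and the toy revision rule are assumptions, not part of the Handbook's description, and real experts would of course revise by judgment rather than by rule.

```python
from collections import Counter


def compile_master_list(responses):
    """Count how many experts mentioned each factor."""
    counts = Counter()
    for factors in responses:
        counts.update(set(factors))  # each expert counts once per factor
    return counts


def run_delphi(first_round, revise, max_rounds=10):
    """Resubmit the master list until it no longer changes.

    `revise(expert_index, master)` is a hypothetical callback returning
    that expert's revised list of factors after seeing the master list.
    """
    master = compile_master_list(first_round)
    for _ in range(max_rounds):
        responses = [revise(i, master) for i in range(len(first_round))]
        new_master = compile_master_list(responses)
        if new_master == master:
            break
        master = new_master
    return master


# Toy revision rule (an assumption): every expert keeps only those
# factors that at least two panel members mentioned.
def majority_revision(expert_index, master):
    return sorted(f for f, n in master.items() if n >= 2)


first_round = [
    ["GNP", "population"],
    ["GNP", "defense expenditures"],
    ["GNP", "population", "land mass"],
]
final = run_delphi(first_round, majority_revision)
```

With this toy rule the list settles on the factors mentioned by a majority of the panel after two rounds.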


When  the  list  of  factors  is  obtained,  the  analyst  must  then  determine 
which  factors  are  variables  and  which  are  parameters  in  the  time 
frame  to  be  examined.  To  accomplish  this,  it  may  again  be  necessary 
for  the  analyst  to  employ  some  form  of  Consensus  Methods,  particu- 
larly if  he  is  not  an  expert  in  the  area  of  the  forecast. 


For  an  explanation  of  Consensus  Methods,  see  A General  Handbook 
for  Long-Range  Environmental  Forecasting.  Vol.  I (Arlington,  VA, 
February  1973),  pp.  18-24. 
Once the factors relevant to the analyst's forecasting objective have
been chosen, the forecaster must determine the assumed causal
relationships among them. It is important to note that the relationships
considered are those presumed to exist on the basis of the expertise
of either the analyst or his consultants. The estimation phase of the
research may show that certain causal relationships initially deemed
important by the experts are in fact nonexistent. For example,
international relations experts may believe that alignment between two
nations is mainly affected by foreign aid received by one nation from
the other. If this is not borne out by empirical examination, that
hypothesis is rejected.


Relationships between variables fall into two general classes. First
are those that relate variables in a rigorous and consistent manner,
such as the trajectory of a particle in physics. These relations are
termed "laws." Second are relations that are not consistently stable.
In socio-political, military, and economic long-range forecasting,
laws are extremely rare if not totally absent. Thus empirical
estimation is required to discover the nature of relations in particular
socio-economic and political systems. To accomplish this task
systematically, the estimation must be structured. A modified use of
Cross-Impact Matrices can be employed to discover whether the relations
between variables are positive, negative, or absent, and to indicate
which variables are endogenous and which are exogenous.4

Causal relationships are characterized by cause and effect. Thus
it may be that the alignment of Ecuador with the United States
depends on the level of U.S. aid to that country--the larger the aid
(the cause), the more the alignment (the effect).

Estimation involves calculating the values of structural parameters
from sample data. This procedure is discussed in more detail
in Chapter 3.

The use of Cross-Impact Matrices in this context is illustrated in
Table 1 by an example that contains three variables--A, B, and C.

The  forecaster  is  initially  interested  in  knowing  which  variables  bring 
about  changes  in  other  variables,  that  is,  which  variables  are  endo- 
genous and  which  are  exogenous.  This  causality  can  be  determined 
by  constructing  and  examining  the  following  matrix. 


TABLE 1

A PROCEDURE TO IDENTIFY CAUSALITY BETWEEN VARIABLES

  Cause            Effect Variables
  Variables        A      B      C

  A                0      X      0
  B                0      0      X
  C                0      X      0

If the forecaster or expert believes that a variable may cause a change
in the value of another variable, the relationship can be indicated by
a mark placed in the appropriate box. In Table 1, variable A causes
changes in B, B causes changes in C, and C causes changes in B. If
A represents the similarity in the ethnic background of the population of
two countries, B represents their degree of alignment, and C represents
the number of visits by their heads of state, then ethnic similarity affects
alignment, alignment affects the number of visits, and the number of
visits affects alignment.

4 For a discussion of Cross-Impact Matrices, see the Handbook,
pp. 33-39.

A zero  in  a cell  of  the  matrix  indicates  that  there  is  no  relationship 
from  one  variable  to  the  other.  If  a column  is  composed  only  of  zeros, 
then  the  corresponding  variable  is  not  affected  by  any  other  variables 
in  the  system  and  is,  by  definition,  exogenous.  If  a column  has  at 
least  one  nonzero  entry,  then  the  variable  is  endogenous  since  its 
value is determined by at least one other variable. In the
aforementioned example, variable A is exogenous since the first column is
composed only of zeros. B and C, on the other hand, are endogenous
variables since B is affected by A and by C, and C is affected by B.5
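
The column rule just described can be checked mechanically. The following is a minimal sketch, assuming the matrix is stored as rows of cause variables and columns of effect variables with "0" marking no relationship; the function name is hypothetical.

```python
def classify(labels, matrix):
    """Split variables into exogenous and endogenous.

    matrix[i][j] is anything other than "0" when labels[i] (the cause)
    directly affects labels[j] (the effect).
    """
    exogenous, endogenous = [], []
    for j, name in enumerate(labels):
        column = [row[j] for row in matrix]
        if all(cell == "0" for cell in column):
            exogenous.append(name)   # affected by no other variable
        else:
            endogenous.append(name)  # affected by at least one variable
    return exogenous, endogenous


# Table 1: A causes B, B causes C, C causes B.
table_1 = [
    ["0", "X", "0"],  # cause A
    ["0", "0", "X"],  # cause B
    ["0", "X", "0"],  # cause C
]
exogenous, endogenous = classify(["A", "B", "C"], table_1)
```

For Table 1 this returns A as the only exogenous variable and B and C as endogenous, in agreement with the text.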

For forecasting purposes, awareness of the presence of causality
between variables is necessary but not sufficient since the relationship
between two or more variables can be positive, negative, or a
combination of both. This can be indicated by replacing the "X" mark
with a "+" or "-" sign. In the previous example one would expect all
the signs to be positive since increases in ethnic similarity (A) tend to
increase alignment (B), which in turn increases the number of visits
(C), which further increases alignment. If, however, variable C
represents the number of territorial disputes, the cell signs might be as
follows:

TABLE 2

A PROCEDURE TO IDENTIFY THE
SIGNS OF THE RELATIONSHIPS BETWEEN VARIABLES

  Cause            Effect Variables
  Variables        A      B      C

  A                0      +      0
  B                0      0      -
  C                0      -      0

5 A close examination of Table 1 reveals that only the direct effects of
a variable on another variable are measured. There are, nevertheless,
indirect effects that are not mentioned in this case to keep
the exposition simple. Thus variable A affects B, and B affects C;
therefore A indirectly affects C. This, however, is not shown in the
matrix. See John G. Kemeny, et al., Introduction to Finite
Mathematics (Englewood Cliffs, N.J.: Prentice-Hall, 1957), pp. 307-320
for further examples of the use of such matrices.

A positive relationship between two variables implies that if the
causal variable increases, then the effect variable also increases;
the converse also holds. A negative relationship means that if the
cause variable increases, then the effect variable decreases.
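
The indirect effects noted for Table 1 (A affects B, and B affects C, so A indirectly affects C) can be traced mechanically. The sketch below uses Warshall's transitive-closure algorithm, which is our choice for illustration and is not prescribed by the Guide; the function name is hypothetical.

```python
def reachable_effects(labels, matrix):
    """Map each cause to every variable it affects, directly or indirectly.

    matrix[i][j] is anything other than "0" when labels[i] directly
    affects labels[j] (so "X", "+", and "-" cells all count as links).
    """
    n = len(labels)
    reach = [[cell != "0" for cell in row] for row in matrix]
    # Warshall's algorithm: allow paths that pass through variable k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return {
        labels[i]: {labels[j] for j in range(n) if reach[i][j]}
        for i in range(n)
    }


table_1 = [
    ["0", "X", "0"],  # cause A
    ["0", "0", "X"],  # cause B
    ["0", "X", "0"],  # cause C
]
effects = reachable_effects(["A", "B", "C"], table_1)
```

Here A reaches both B and C, while B and C each reach themselves through the feedback loop between them. The sketch records only whether an effect exists, not its sign.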

Sometimes  a cell  has  both  a positive  and  negative  sign  in  it,  suggesting 
that  one  variable  influences  another  variable  in  both  a positive  and  neg- 
ative fashion.  This  case  is  discussed  in  detail  in  the  forthcoming  sec- 
tion which  examines  some  basic  types  of  relationships. 

The aforementioned steps of the forecasting procedure may at this
point be illustrated by an example. Consider the following problem,
which involves forecasting the national power base of Ghana in the
year 1985. As described in the previous sections of the Guide, the
forecast should proceed with the following steps:

1. Identify Factors

a. Separate factors into variables and parameters.

b. Separate variables into endogenous and exogenous
variables.

2. Identify the signs (positive, negative, or zero) of the
relationships among the factors.

We define national power base as the material and human resources
available to a nation to influence its political environment.

Next,  a Delphi  panel  composed  of  international  relations  experts  fa- 
miliar with  the  latest  literature  on  all  aspects  of  national  power  base 
is  formed.  Members  of  this  panel  should  be  experts  on  Ghana--its 
government,  economy,  foreign  policy,  military.  The  experts  then 
carry  out  a Delphi  exercise.  The  output  of  the  exercise  might  be  a 
list  of  factors  such  as  that  shown  below: 

• Military  Factors 

--Number  of  men  in  armed  forces 
--Defense  expenditures 

• Economic  Factors 
--Gross  national  product 

• Other Factors

--Land mass of Ghana
--Population
--Percentage of population under 20 years of age


The list of factors should include both variables (e.g., size of armed
forces) and parameters (e.g., land mass of Ghana).

The list of variables is further broken down into endogenous and
exogenous variables. For this task, a matrix such as the one in Table 3
is constructed. In this matrix, the following factors are variables:
number of men in the armed forces (A), defense expenditures (B),
GNP (C), and population (D). This represents a total of four variables
which can be categorized in a 4-by-4 matrix as follows:


TABLE 3

A HYPOTHETICAL MATRIX TO
FORECAST THE POWER BASE OF GHANA IN 1985

  Cause            Effect Variables
  Variables        A      B      C      D

  A                0      +X     0      0
  B                +X     0      0      0
  C                0      +X     0      0
  D                +X     0      0      0

In each cell of the matrix the forecaster inserts a zero or an X,
depending on whether he feels a particular variable influences another
variable. For example, an increase in gross national product (C) may
lead to an increase in defense expenditures (B); but an increase in
population (D) will not influence defense expenditures. Of the four
variables considered in the matrix only two, population and GNP,
are exogenous since only zeros can be found in the columns under
the two variables.


The  next  step  in  the  forecast  process  involves  discovering  the  signs 
of  the  relationships  between  the  variables.  The  assistance  of  experts 
may  be  necessary  at  this  point  to  help  the  forecaster.  On  the  basis 
of the theoretical literature on national power base, all the
relationships are believed to be positive.
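
Once signs are attached, the matrix can be used to read off the expected direction of each direct effect. The sketch below is illustrative: the function name is hypothetical, and the matrix is an assumed reading of the Ghana example in which all nonzero cells are positive, GNP affects defense expenditures, and population and GNP are exogenous.

```python
SIGN = {"+": 1, "-": -1, "0": 0}


def direct_effects(labels, signs, cause, change=+1):
    """Direction of change in each variable directly affected by `cause`.

    `change` is +1 for an increase in the cause and -1 for a decrease.
    """
    i = labels.index(cause)
    effects = {}
    for j, cell in enumerate(signs[i]):
        direction = SIGN[cell] * change
        if direction != 0:
            effects[labels[j]] = "increase" if direction > 0 else "decrease"
    return effects


labels = ["armed forces", "defense expenditures", "GNP", "population"]
# Rows are causes, columns are effects (assumed entries, all positive).
ghana = [
    ["0", "+", "0", "0"],  # armed forces
    ["+", "0", "0", "0"],  # defense expenditures
    ["0", "+", "0", "0"],  # GNP
    ["+", "0", "0", "0"],  # population
]

gnp_up = direct_effects(labels, ghana, "GNP")
population_down = direct_effects(labels, ghana, "population", change=-1)
```

A sign matrix of this kind says only which way an effect runs; the size of the effect is a structural parameter that must be estimated from data.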


Types  of  Relationships 


There  are  three  ways  to  express  relationships  between  variables-- 
mathematically,  diagrammatically,  and  verbally.  Only  the  first 
is  adequate  for  systematically  forecasting  any  but  the  most  simple 
system  with  any  degree  of  reliability.  Nevertheless,  the  other  two 
methods  are  critical  adjuncts  to  the  process  of  building  models. 

For expository purposes, the second method was selected to
describe certain common types of relationships. First, the bivariate
case (one with two variables) is presented. Second, the multivariate
case (one with more than two variables), in which an endogenous
variable is related to more than one exogenous variable, is discussed.


The  bivariate  case  can  be  illustrated  diagrammatically  since  it  re- 
quires only  two  dimensions  (axes).  Consider  the  following  figures 
which  describe  a positive  relationship  between  variables  A and  B. 
Figures 2, 3, and 4. Examples of Positive Relationships


Furthermore, assume that causality exists between A and B in a
manner that the occurrence of B causes the occurrence of A. In other
words, A is the endogenous variable while B is the exogenous variable.
Considering the function Oa in Figure 2, an increase of 10 units in the
value of B brings about an identical increase in the value of A (if both
are measured in the same units). This is a linear relationship. It is
possible, however, that another functional relationship such as Ob
relates A to B. In this instance an increase in B brings about a smaller
increase in A. On the other hand, if the function is located to the left
of Oa (Oc for example), an increase in B brings about a larger
increase in A.

Unlike Figure 2, the nature of the relationship between A and B in
Figure 3 is nonlinear. This absence of linearity implies that the
impact of B on A is not constant throughout the curve but changes
according to the magnitude of B. Incremental increases in B cause
more than proportional incremental increases in A. This type of
relationship is called an exponential relationship and is sometimes used
by demographers to illustrate population growth. Figure 4 describes
a logarithmic relationship. In this case, successive increases in B
bring about smaller and smaller increases in A.

In this discussion we are only considering static analysis (B causes
A instantaneously with no time lag). In dynamic analysis, however,
a change in B may cause a change in A after a time lag.
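
The difference among the linear, exponential, and logarithmic forms shows up in how A responds to successive equal increases in B. The sketch below assumes particular functional forms and coefficients purely for illustration; they are not taken from the figures.

```python
import math


def linear(b, slope=1.0):
    return slope * b


def exponential(b, rate=0.5):
    return math.exp(rate * b)


def logarithmic(b):
    return math.log(1.0 + b)


def increments(f, points):
    """Change in A between successive values of B."""
    values = [f(b) for b in points]
    return [later - earlier for earlier, later in zip(values, values[1:])]


points = [0, 1, 2, 3, 4]  # equal one-unit steps in B
linear_steps = increments(linear, points)       # constant increases
exponential_steps = increments(exponential, points)  # growing increases
logarithmic_steps = increments(logarithmic, points)  # shrinking increases
```

Equal steps in B give constant increases in A under the linear form, growing increases under the exponential form, and shrinking increases under the logarithmic form.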


The functional relationships portrayed in Figures 2, 3, and 4 described
positive relationships between variables A and B. Negative
relationships between variables may exist as shown in Figures 5-7.

Figures 5, 6, and 7. Examples of Negative Relationships


Figure 5 presents an inverse linear relationship between A and B
which is similar to the relationship portrayed in Figure 2; but it is
negative since an increase in B brings about an identical decrease in
A. In Figures 6 and 7 the functional forms are such that an increase
in B brings about decreases in A at a decreasing rate in the first
instance and at an increasing rate in the second.
In both sets of figures, the relationships were portrayed in a manner
such that the endogenous variable either increased or decreased for
all values of the exogenous variable. It is possible, however, for a
relationship to increase the value of the endogenous variable for some
values of the exogenous variable, and to decrease the value for other
values of the exogenous variable. This is what is meant by both
positive and negative signs in the cells of the matrix in Table 2.

Examples of these types of relationships are given in Figures 8 through
10.

Figures 8, 9, and 10. Examples of Higher Order Relationships


Figures 8 and 9 portray "quadratic" relationships, while Figure 10
portrays a "cubic" relationship. In Figure 8, any increase in B to the
left of k1 brings about a decrease in A; any increase in B to the right
of k1, however, brings about an increase in A. The opposite
relationship holds in Figure 9. Finally, in Figure 10, both characteristics
of Figures 8 and 9 hold depending on which values of B are chosen.
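
A quadratic relationship of the kind described for Figure 8 can be sketched numerically to show the sign of the effect changing at the turning point. The functional form, the turning point value `k`, and the function names below are assumptions chosen for illustration.

```python
def u_shaped(b, k=5.0):
    """A falls as B rises toward k, then rises (assumed quadratic form)."""
    return (b - k) ** 2


def inverted_u(b, k=5.0):
    """The opposite case: A rises as B rises toward k, then falls."""
    return -((b - k) ** 2)


def marginal_sign(f, b, step=1e-6):
    """Sign of the change in A for a small increase in B at the point b."""
    delta = f(b + step) - f(b)
    if delta > 0:
        return "+"
    if delta < 0:
        return "-"
    return "0"


left_of_k = marginal_sign(u_shaped, 2.0)    # B below the turning point
right_of_k = marginal_sign(u_shaped, 8.0)   # B above the turning point
```

A cell holding both a "+" and a "-" in the sign matrix corresponds to a function of this kind, whose marginal sign depends on the value of B.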


Every relationship described thus far considered only two variables.
In the real world, however, relationships are often multivariate; that
is, values of the endogenous variable are not determined by a single
exogenous variable but by two, three, or more exogenous variables.
Thus the alignment of Sri Lanka (Ceylon) with respect to the United
States may depend on 1) the number of visits by heads of state between
the two countries, 2) the level of United States aid to Sri Lanka, and
3) trade patterns between the two countries. To describe these
variables diagrammatically is difficult since only two-dimensional
relationships can be realistically portrayed graphically. Relationships among
three variables can be expressed in the form of "three-dimensional
solids." Relationships among more than three variables cannot be
conceived (at least by most people) since we are constrained by the
three-dimensional world we live in. An attempt is made in Figure 11
to draw a three-dimensional diagram expressing the relationships
among three variables in two-dimensional space. Variable A is the
endogenous variable, while B and C are the exogenous variables. The
function relating the three variables is not a line (such as in the
previous figures) but a two-dimensional plane or surface (w x y z). A point
on the plane, such as M, will have three coordinates--a, b, c--one
coordinate for each axis.
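
A plane relating one endogenous variable to two exogenous variables can be written as a simple linear function. The sketch below is illustrative only; the intercept and slopes are assumed values, not estimates, and the function name is hypothetical.

```python
def plane(b, c, intercept=1.0, slope_b=2.0, slope_c=-0.5):
    """Value of A on an assumed plane A = intercept + slope_b*B + slope_c*C."""
    return intercept + slope_b * b + slope_c * c


# A point M on the plane has one coordinate for each axis: (a, b, c).
b, c = 3.0, 4.0
point_m = (plane(b, c), b, c)

# Holding C fixed, a one-unit increase in B changes A by slope_b.
effect_of_b = plane(b + 1.0, c) - plane(b, c)
```

Adding further exogenous variables only adds terms to the function, which is why the mathematical form remains usable after the diagrams can no longer be drawn.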

In  this  chapter  important  basic  types  of  relationships  were  presented. 
In  mathematics  there  are  an  infinite  number  of  such  relationships  that 
may  relate  variables  to  each  other.  In  the  social  sciences  we  concen- 
trate on  relatively  simple  forms.  Yet  to  establish  these  forms  em- 
pirically, data  on  the  variables  are  necessary.  This  is  the  topic  of  the 
next  chapter  of  this  Research  Guide. 


There are three distinct but related problems in the data collection
phase  of  a forecasting  effort: 

• Selecting  indicators  of  the  variables 

• Operationalizing  the  variables 

• Collecting  the  data 

Selecting the indicators of a variable involves discovering those
directly measurable variables that best "tap" the meaning of a
conceptual variable. Operationalizing a variable involves determining
precisely how the indicators are to be measured and combined.
Collecting the data is simply performing the actual measurements.

THE  CHOICE  OF  INDICATORS 

In the previous chapter, the determination of the forecasting model
was made on the basis of conceptual variables. A conceptual variable
is one that has direct meaning to the forecaster in a causal sense.
The wealth or power of a nation and the level of conflict or alignment
between or among nations are such variables. However, to perform
a forecast with precise, unambiguous results, it is absolutely
necessary to measure such conceptual variables in such a way as to provide
an unimpeachable basis for the use of the forecast.

To do this one must often use indicators of conceptual variables. An
indicator of a variable is usually some other variable that is used to
measure the original conceptual variable. For example, the GNP of
a nation can be used as an indicator of that nation's economic power
base. The voting patterns of nations in the United Nations may be an
indicator of their alignment.

There  is  no  systematic  method  to  determine  a priori  the  best  indica- 
tors of  a conceptual  variable.  Experts  in  the  area  are  the  most  qual- 
ified persons  to  select  these  indicators  since  they  are  familiar  with 
the  latest  theoretical  and  empirical  research  in  the  area.  It  is  im- 
portant to  choose  as  many  indicators  of  a variable  as  possible,  and 
the  expert  may  be  in  the  best  position  to  do  this.  Some  variables  may 
admit  to  relatively  few  possible  indicators,  such  as  economic  power 
base,  while  others  may  have  a large  number  of  indicators,  such  as 
conflict  between  nations.  In  either  case,  however,  it  is  important  to 
be  as  thorough  as  possible  in  identifying  indicators. 


Obtaining the greatest possible number of indicators is important
because the forecaster may find that some data are not available for
particular indicators, and/or the indicators may be statistically
related. In the first situation, the quality of the forecast
is greatly reduced if inappropriate indicators must be used instead. In
the second situation, there are somewhat more complex consequences.
If the indicators of a conceptual variable are highly correlated, the
variable may be homogeneous. If this is the case, then only one of
the indicators is needed to measure the variable. If the indicators
are not highly correlated among themselves, the variable is
heterogeneous, and it will be necessary to combine values of these
indicators to measure the variable. Whether a conceptual variable is
homogeneous or heterogeneous will not usually be known when indicators
are selected; therefore it is important to select as many indicators
as possible.


OPERATIONALIZING  THE  VARIABLES  (INDICATORS). 


Operationalizing a variable means determining precisely how the
individual indicators are to be measured and the most likely procedures
for combining them to form variables. This phase clearly overlaps
with the data collection phase in that certain operational distinctions
must be made after the data are available; but a number of important
decisions must be made prior to data collection.

The most fundamental decision that must be made is precisely how to
measure the indicator. This decision will affect the remainder of the
forecast process, since it prescribes the manner in which the data
will be collected, and may preclude later revision of the data. For
example, if military power base is the conceptual variable, and the
military manpower of a nation is one indicator, it is crucial to
determine precisely what is meant by military manpower. One would
estimate the number of individuals in the army, navy, air force, and
rocket force. But should the number of individuals in the coast guard,
the ready reserves, and the militia be counted? Should one be
prepared to accept the official government figures for these indicators,
or should one standardize on some "impartial" source such as the
International Institute for Strategic Studies? And does one want to
attempt to "weight" such figures by estimated level of training, degree
of readiness, and quality of leadership, or are these components of
some other indicator? These questions must be asked prior to data
collection.

After the data are collected, a single measure of the conceptual
variable is created. The question of homogeneous or heterogeneous
variables can be tested at this point. To continue the above example, if
one had arrived at an acceptable definition of military manpower and
were able to collect the required data, one would need to know how to
combine the data with other indicators of military power base. The
values of the indicators could be correlated with each other, and the
resulting pattern analyzed.

Every pair of indicators has a correlation coefficient that indicates
how strongly the two indicators vary together. If all of the
correlation coefficients are sufficiently high (a figure of ± 0.80 may
be a useful cutoff), then the variable may be homogeneous, and any single
indicator can be selected. If most of the indicators are highly
correlated, it may be possible to ignore the ones that are not highly
correlated and still select one indicator from the homogeneous group. If
this is not possible (because an uncorrelated indicator is too important
to omit) or if there is no pattern of homogeneity, then the variable
must be heterogeneous, and the operationalization process continues
by constructing an index of the variable that considers all of its
indicators properly weighted. There are three ways to construct such
an index. The first is judgmental and permits the expert to say that
one indicator is two or three (or whatever) times as important as
another indicator, and to weight all indicators by such a procedure.
The second is quantitative and uses Regression Analysis,1 by which
the forecaster creates values of the composite variable from expert
judgment for a sample of cases, and then uses least squares to
determine the weights for the indicators that best reproduce those
values. The third is Factor Analysis,2 which is a powerful tool to derive
weights determined entirely by the degree of correlation among
the indicators themselves. In all three cases, the weights are then
used to construct the values of the variable for use in the
forecast-estimation phase.

1 See the Handbook, pp. 46-52.

2 See R. J. Rummel, "Understanding Factor Analysis," Journal of
Conflict Resolution (December 1967), pp. 444-480.
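
The homogeneity test and the first (judgmental) way of building an
index can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions, not a procedure taken from the
Handbook: the function names, the indicator values, and the weights are
all hypothetical, and the indicators are combined in their raw units,
whereas in practice they would probably need to be put on a common
scale first.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def is_homogeneous(indicators, cutoff=0.80):
    """True if every pair of indicators correlates at |r| >= cutoff."""
    return all(
        abs(pearson(indicators[a], indicators[b])) >= cutoff
        for a, b in combinations(indicators, 2)
    )


def weighted_index(indicators, weights):
    """Combine indicators case by case using judgmental weights."""
    names = list(indicators)
    n_cases = len(indicators[names[0]])
    return [
        sum(weights[name] * indicators[name][i] for name in names)
        for i in range(n_cases)
    ]


# Hypothetical indicator values for five cases (illustrative only).
military = {
    "manpower": [1.0, 2.0, 3.0, 4.0, 5.0],
    "defense_budget": [2.1, 3.9, 6.2, 8.0, 9.9],
    "nuclear_capability": [5.0, 1.0, 4.0, 2.0, 3.0],
}

homogeneous = is_homogeneous(military)

# Hypothetical judgment: manpower counts twice as much as the others.
judged_weights = {
    "manpower": 2.0,
    "defense_budget": 1.0,
    "nuclear_capability": 1.0,
}
index = weighted_index(military, judged_weights)
```

With these made-up values the third indicator does not vary with the
other two, so the variable would be treated as heterogeneous and the
weighted index, rather than a single indicator, would carry forward.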

DATA  COLLECTION  ON  THE  VARIABLES 


Two potential problems that must be solved confront the analyst at
this point in the research. They concern the reliability and
availability of the data for any given indicator. Data quality is crucial
for precise and reliable forecasts. There are three possible sources for
most socio-economic and political data. The first source is the
forecaster himself, who creates the data and develops a coding procedure.
Generally, data obtained by coding are subject to errors that arise
when specifying certain coding rules (operationalization) and when
applying those rules. Errors also occur when coding rules are changed
in the middle of the data-gathering process, thus preventing data
comparisons. This is especially acute in data collected at different
times (time-series data) since the coding rules themselves may change.
Errors in specifying the coding rules can be corrected by expertise.
Errors in applying the coding rules can be controlled by applying
constant checks on the coders' performance.3

A second and more important data source (to the international affairs
forecaster) is governments or other official data collection agencies.
In addition to previously mentioned errors, government data may
contain biases that are either conscious or unconscious. Governments
sometimes deliberately present biased data for political reasons.
More importantly, government data, created as they are for other
purposes, are often nearly, but not exactly, what the analyst wants. The
analyst must adjust biased government data whenever he detects a
bias.

3 For a detailed explanation of various coding procedures, see Kenneth
Janda, Data Processing (Evanston, Ill., Northwestern University
Press, 1969), Chapter 2.

Another  source  of  data  is  provided  by  such  international  organizations 
as  the  United  Nations,  OECD,  and  the  World  Bank.  These  agencies 
do  not  usually  collect  the  data  but  report  data  collected  by  individual 
governments.  These  agencies  do,  nevertheless,  attempt  to  categorize 
the data and make them comparable across countries. Moreover,
international organizations may correct the data when conscious biases
are present.

Data availability is another important problem analysts must face in
the data collection phase. A set of data is very seldom complete.
This is especially true of political and social data as well as economic
data. Generally speaking, quantitative techniques of forecasting
require complete data sets, while qualitative techniques can be applied
to incomplete sets of data. If the analyst wants to employ quantitative
techniques, he must fill in the gaps in the data. There are various
ways to create "missing" data. Some of the important methods are
discussed as follows:4

Fragmentary evidence and "expert" judgment: Informed opinion and
verbal descriptions can be used to obtain estimates of missing data.
Though obviously subject to bias, this method may be useful where
there is considerable prior knowledge about a particular case.


Use of mean, mode, or median: One of the most common
methods for handling missing data is to substitute their
mean value. Means are computed by using all cases having
data on the variable. Depending upon specific characteristics
of the variable in question, the mode or median may
be preferred over the mean. This method relies solely
upon available data and thus avoids judgmental bias.
However, averages tend to underestimate the total amount of
variance from the mean; that is, the number of extreme
values is reduced. Estimates using means, medians, or
modes are most appropriate where available data cluster
around one of these values.

Rating by distribution of available cases: This approach
combines both judgmental and empirical information.
For a variable with missing data, the mean value is
computed using all cases on which data are available.
Additional data points are obtained by determining judgmentally
the number of standard deviations the new data are from
the mean. Rating is an improvement over simple
reliance upon the mean when the researcher has some prior
knowledge about a variable's distribution.
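
The second and third methods above lend themselves to a short sketch.
The code below is only an illustration under assumptions: the function
names and the sample values are hypothetical, missing entries are
represented by None, and the population standard deviation is used for
the rating method, although the text does not say which form of the
standard deviation is intended.

```python
from statistics import mean, median, pstdev


def fill_with_average(values, average=mean):
    """Replace None entries with the mean (or median) of the observed cases."""
    observed = [v for v in values if v is not None]
    fill = average(observed)
    return [fill if v is None else v for v in values]


def fill_by_rating(values, ratings):
    """Replace None entries with mean + rating * standard deviation.

    `ratings` maps the position of each missing case to the judged number
    of standard deviations that case lies from the mean.
    """
    observed = [v for v in values if v is not None]
    centre, spread = mean(observed), pstdev(observed)
    return [
        centre + ratings[i] * spread if v is None else v
        for i, v in enumerate(values)
    ]


# Hypothetical cross-sectional values with two missing cases.
energy = [10.0, 12.0, None, 14.0, None, 12.0]

by_mean = fill_with_average(energy)
by_median = fill_with_average(energy, average=median)
# Judgment: case 2 lies one deviation above the mean, case 4 half a
# deviation below it.
by_rating = fill_by_rating(energy, {2: 1.0, 4: -0.5})
```

Substituting the mean gives both missing cases the same value, which
is the loss of extreme values described above; the rating method lets
the analyst's judgment restore some of that spread.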


Once  the  missing  data  have  been  derived,  the  levels  of  measurement 
of  the  indicators  and  those  used  as  predictors  must  be  identified  if  the 
analyst  is  to  proceed  with  the  operationalization  phase  of  the  forecast.  7 
If  quantitative  techniques  of  estimation  are  to  be  used,  the  level  of 
measurement  of  the  available  data  determines  the  appropriateness  of 
the  technique. 


This  technique  to  manufacture  data  is  not  appropriate  for  time-series 
data.  For  example,  missing  data  on  GNP  cannot  be  substituted  by 
mean  values  of  the  data  set  since  GNP  usually  increases  with  time. 

A description  of  the  various  levels  of  measurement  that  socio- 
economic and  political  data  can  assume  is  found  in  the  Handbook, 
pp.  11-13. 

Regression analysis is also sometimes used to generate missing
data. For an explanation of this technique, see the Handbook, pp. 46-52.


CHAPTER 3: PARAMETER ESTIMATION


By the time the analyst reaches this step in the forecasting process,
he should have developed a conceptual model in which every variable
has been operationalized and classified as endogenous or exogenous.
Moreover, the analyst should have familiarized himself with the basic
types of relations and identified and collected various data that
describe the variables.1 If the data and the model "fit" each other, that
is, if there is no discontinuity in the data, then the estimation
procedure phase can be initiated. If there are gaps in the data, however,
they must be filled by implementing some of the techniques described
earlier.

There is a second problem that may prevent the model and the data
from fitting each other. This problem is related to the presence of
different types of data categories. Data can be nominal, ordinal,
ratio, or interval.2 In certain instances it may be impossible to
employ certain estimation techniques if the level of measurement is low
(nominal or ordinal). Input-Output Analysis, for example, is a
technique that requires interval data. If the data available are only of
the nominal category, then the aforementioned technique cannot be
employed. This problem can impose certain constraints on the parameter
estimation phase of the forecast effort.


1 An in-depth knowledge of the various types of relationships discussed
in Chapter 1 is not absolutely necessary for the forecast process; it
is, nevertheless, helpful.

2 For an explanation and discussion of the difference between the
categories, see the Handbook, pp. 11-13.

among those listed in the table.

TABLE 4

THE CHOICE OF ESTIMATION TECHNIQUES


                                      Parameter Type
Estimation Technique      Conceptual           Structural

Judgmental                Experts              Delphi
                                               Cross-Impact

Empirical                 Data Examination     Regression Analysis
                                               Input-Output Analysis
                                               Game Theory Models


It is crucial to note that in any forecasting process, the data and
analyses are based on the past. Quantitative techniques merely
make this dependence explicit, [...] techniques may
permit the dependence to [...]

As  was  stated  earlier,  it  is  important  to  distinguish  between  concep- 
tual and  structural  parameters.  A conceptual  parameter  is  a factor 
that  remains  unchanged  throughout  the  period  examined  and  is  not 
included  explicitly  in  the  model.  Structural  parameters,  on  the 
other  hand,  are  parameters  that  are  calculated  via  some  empirical 
technique  and  are  directly  used  in  the  forecast.  Upon  examining  the 
factors,  it  may  become  apparent  that  some  parameters  are  concep- 
tual parameters,  such  as  the  pressure  of  conflict  between  Israel  and 
the  Arab  states  referred  to  in  Chapter  1.  Forecasts  based  on  concep- 
tual parameters  will  produce  inaccuracies.  Because  of  the  possibility 
of  these  biases,  it  is  very  important  to  examine  the  data  carefully 
to  discover  if  conceptual  parameters  are  indeed  present.  In  cases 
where  there  are  no  data,  or  where  the  level  of  measurement  in  the 
data  is  low,  judgmental  expertise  must  be  used  to  determine  whether 
conceptual parameters are present in the system forecast. If
conceptual parameters are part of the data, the estimates are valid given
the presence of those parameters.

A PROCEDURE  TO  CHOOSE  ESTIMATION  TECHNIQUES 

The analyst should attempt to use estimation techniques located in
the lower right-hand portion of Table 4. Yet there are many constraints
that may prevent him from doing this. First, the functional forms
that relate the variables to one another may not be known, rendering any
empirical estimation difficult. Second, the level of measurement of
the data can impose difficulties on the use of certain quantitative
estimation procedures. Third, quantitative techniques cannot be used
when there are fewer cases than structural parameters to be
estimated. This is a problem so serious as to prevent almost any kind of
systematic estimation. It can, however, be solved by acquiring
additional cases.4


Despite  the  presence  of  these  constraints,  every  effort  should  be  made 
by  the  analyst  to  choose  quantitative  techniques  of  estimation.  This 
choice  is  recommended  not  because  the  authors  have  a blind  commit- 
ment to  quantitative  methods  but  because  quantitative  techniques  of 
estimation  will  give  rise  to  more  precise,  explicit,  accurate,  and 
hence  reliable  forecasts.  Quantitative  methods  are  superior  to  qual- 
itative techniques  for  three  principal  reasons.  First,  they  require 
that  definitions  of  variables  be  exact  and  that  assumptions  be  stated 
explicitly. Second, quantitative estimation can consider complex
relationships among variables that cannot be evaluated verbally.
Third, quantitative techniques reduce biases that are introduced into
the forecasts by value judgments and the limited memories of analysts.

The most popular technique for parameter estimation is Regression
Analysis.5 Nevertheless, there are other available techniques such as
Input-Output Analysis and Game Theory Models.6 In the absence of
appropriate data, the forecaster may use a judgmental estimation
technique such as Delphi.7


4 This is a mathematical problem that requires that the number of
unknowns should be less than or equal to the number of equations.

5 For a discussion of Regression Analysis, see the Handbook, pp.
46-52.

6 Ibid., pp. 57-60 and pp. 73-79 respectively.

7 Ibid., pp. 18-24.
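
As an illustration of what least-squares estimation involves, and of
the constraint described in footnote 4, the following sketch estimates
the parameters of a linear relation by solving the normal equations. It
is a bare-bones example under assumptions, not the Handbook's
procedure: the function names and the data are hypothetical, the data
are constructed to fit a known equation exactly, and a working analyst
would ordinarily rely on an established statistical package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: parameters cannot be estimated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(rows, y):
    """Estimate [A0, A1, ..., Ak] in y = A0 + A1*x1 + ... + Ak*xk + e."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    n_params = len(design[0])
    if len(design) < n_params:
        raise ValueError("fewer cases than parameters to be estimated")
    # Normal equations: (X'X) A = X'y
    xtx = [
        [sum(r[i] * r[j] for r in design) for j in range(n_params)]
        for i in range(n_params)
    ]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(n_params)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
cases = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 4.0)]
outcome = [2 + 3 * x1 - x2 for x1, x2 in cases]
params = least_squares(cases, outcome)
```

Because the sample data contain no error term, the estimates recover
the generating coefficients up to rounding. Passing fewer cases than
parameters raises an error, which is the situation that can be remedied
only by acquiring additional cases.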


THE  PROBLEM  OF  FORECASTING  EXOGENOUS  VARIABLES 


Once  the  parameters  of  the  forecasting  model  have  been  estimated,  the 
forecaster  has  a model  that  relates  variables  to  each  other.  These 
relationships  have  been  derived  on  the  basis  of  past  data  on  the  vari- 
ables. The  model  relates  every  endogenous  variable  to  exogenous 
variables  and  parameters.  The  forecasting  problem  involves  pro- 
jecting the  data  beyond  the  confines  of  the  past.  This  requires  fore- 
casting values  of  the  exogenous  variables  that  are  then  combined  with 
the  parameters  to  yield  future  values  of  the  endogenous  variables. 

Ideally,  exogenous  variables  should  be  manipulable.  If  planners  could 
set  the  values  of  the  exogenous  variables  in  the  same  manner  that  the 
Federal  Reserve  System  fixes  the  supply  of  money  in  the  U.S.  eco- 
nomy, then  planners  could  help  bring  about  the  future  environment  they 
consider most desirable. In long-range environmental forecasting,
however, this characteristic is often lacking and planners cannot
influence future outcomes. If the exogenous variables of the
forecasting model are not manipulable variables, then they must be
predicted.

On the surface it appears that there are two ways to forecast
exogenous variables: quantitatively or judgmentally. Realistically,
however, only the latter procedure is available. Let us illustrate each
of these procedures by an example. If, on the basis of past data, the
domestic stability of India is found to be dependent on its GNP per
capita, then the future domestic stability may be forecast by first
estimating future Indian GNP per capita. To generate this latter
forecast, one may use an econometric model. Nevertheless, this
approach does not solve the problem since the exogenous variables
of GNP per capita remain to be forecast. In fact, the domestic stability
and the econometric models can be incorporated into one in which the
exogenous variables of GNP per capita are used to forecast stability.

Since  the  use  of  another  model  to  generate  forecast  values  of  predic- 
tor variables  does  not  really  solve  the  problem  but  postpones  it,  the 
variables  have  to  be  forecast  by  using  a qualitative  technique.  The 
General  Handbook  contains  a detailed  description  of  important  tech- 
niques that  may  be  used  in  this  context.  Some  of  the  important  tech- 
niques are  Consensus  Methods,  Scenarios,  Morphological  Analysis, 
and  Cross-Impact  Matrices.  The  forecast  of  the  endogenous  variables 
can  proceed  by  combining  future  values  of  the  exogenous  variables 
with  the  structural  parameters  of  the  model. 

This general approach, which forecasts exogenous variables qualitatively,
implicitly assumes that these variables are easier to forecast
qualitatively than endogenous variables. If this assumption is not made,
then one could ask why not treat all variables as exogenous and forecast
them using some qualitative technique. Alternatively, why build
quantitative models at all? A totally qualitative methodology appears
less cumbersome than one that first derives a model, then estimates the
parameters of the model quantitatively, and then estimates future values
of its predictor variables qualitatively. There are two
reasons for the two-step approach to long-range forecasting. First,
certain variables are easier to predict than others, and hence forecasts
would be more accurate if the endogenous variables are first expressed as
quantitative functions of exogenous variables that may be easier to
forecast. Second, a quantitative model can account for many functional
relationships that are not readily apparent. It is for these reasons
that this two-step approach to forecasting is suggested.

CHAPTER 4: A SIMPLE FORECASTING MODEL


Let  us  assume  that  a national  security  analyst  is  interested  in  fore- 
casting the  national  power  base  of  the  People's  Republic  of  China  in 
the  1985-1990  period.  Furthermore,  suppose  the  analyst  lacks  pro- 
fessional training  in  the  areas  of  political  science  and  international 
relations.  The  question  that  arises  is  how  does  he  generate  such  a 
forecast.  This  chapter  will  develop  a step-by-step  procedure  to  fore- 
cast the  national  power  base  of  the  People's  Republic  of  China  over 
the  long  range. 

There  are  four  basic  steps  to  this  procedure: 

• Building  a conceptual  model 

• Collecting  the  data 

• Estimating  the  parameters 

• Generating  the  forecast 

BUILDING  A CONCEPTUAL  MODEL 


To build a conceptual model, the analyst must compile a list of factors
that can be used to forecast the national power base of the PRC in the
1985-1990 period. The list should include both variables and
parameters. To put the list together, the analyst should contact
established China experts who are also interested in the concept of
national power base. Consensus Methods can be employed to generate such
a list. A hypothetical list of factors is presented in Table 5. This
list is presumed to have been drawn up after considerable interaction
among the experts either through the use of the Committee Approach
or the Delphi Method.


TABLE 5

HYPOTHETICAL LIST OF FACTORS USED TO
FORECAST THE NATIONAL POWER BASE OF THE PRC


Parameters                            Variables

GNP Growth Rate                       Economic Power
Net Population Growth Rate            Military Power
Geographical Area of Country          Total Population
Number of Ethnic Groups in Country    Energy Consumption
Capital Accumulation Rate             Estimated National Resource Endowment
                                      Proportion of GNP Generated by Foreign Sector
                                      Gross Steel Output
                                      Length of Highway Network

The  separation  of  variables  into  endogenous  and  exogenous  variables 
can  be  accomplished  by  using  a modified  Cross-Impact  Matrix.  This 
procedure  involves  three  steps: 

• Make up a matrix of all variables.

• List effect variables on the columns of the matrix and
cause variables on the rows of the matrix.

• Classify all variables whose columns contain only zeros as exogenous.
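
The three steps above can be sketched as a small routine. The
four-variable matrix below is hypothetical and is not the matrix of
this chapter's example; the function name is likewise only
illustrative. A 1 stands for a check mark (a judged direct effect)
and a 0 for its absence.

```python
def classify_variables(names, matrix):
    """Split variables into (exogenous, endogenous).

    matrix[i][j] is 1 if the row (cause) variable i is judged to have a
    direct effect on the column (effect) variable j, and 0 otherwise.
    """
    exogenous, endogenous = [], []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        # A column of zeros means no variable in the system causes this one.
        (endogenous if any(column) else exogenous).append(name)
    return exogenous, endogenous


# Hypothetical matrix: X3 and X4 both affect X1, and X1 affects X2.
names = ["X1", "X2", "X3", "X4"]
matrix = [
    [0, 1, 0, 0],  # X1 -> X2
    [0, 0, 0, 0],  # X2 causes nothing
    [1, 0, 0, 0],  # X3 -> X1
    [1, 0, 0, 0],  # X4 -> X1
]
exogenous, endogenous = classify_variables(names, matrix)
```

Here X3 and X4 come out as exogenous because nothing in the matrix is
judged to cause them, while X1 and X2 each have at least one cause and
are therefore endogenous.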


The variables of Table 5 are labeled X1 through X8 in Table 6 as
follows:


TABLE 6

LABELING OF VARIABLES


Variable    Name of Variable

X1          Economic power
X2          Military power
X3          Total population
X4          Energy consumption
X5          Estimated national resource endowment
X6          Proportion of GNP generated by foreign sector
X7          Gross steel output
X8          Length of highway network

These eight variables will yield an 8 by 8 matrix as shown in Table 7.
The analyst places checks in those cells where he feels causality exists
between any two variables. Thus if one variable is believed to cause a
change in another, a check is placed in the appropriate cell in Table 7.
Causality is presumed to exist either on the basis of past empirical
research or on theoretical premises. This presumes that the analyst is
acquainted with empirical studies and with political science and
international relations theory. If the analyst does not possess such
expertise, then he must rely on help from the panel of experts. In the
example considered, the following variables are exogenous: population,
proportion of GNP generated by the foreign sector, gross steel output,
and length of the highway network. This implies that the national power
base of the PRC will be forecast from the four aforementioned exogenous
variables.


TABLE 7

SEPARATION OF ENDOGENOUS FROM EXOGENOUS VARIABLES*


[8 by 8 matrix with cause variables X1 through X8 on the rows and
effect variables X1 through X8 on the columns; each cell is marked
X or 0. The cell entries are not recoverable from this copy.]

* Only direct effects are measured in the matrix.


The next step in the forecast process is to identify the sign of the
various relationships among the variables. To accomplish this, we
examine the matrix and place a positive or negative sign in those cells
that have been marked with an X. Decisions as to whether the
relationship between two variables is positive or negative are made either
on the basis of theoretical principles or on the observation of past
empirical studies that have identified such relationships. By recalling
the matrix, the following signs are inserted in each cell:


TABLE 8

IDENTIFICATION OF THE SIGNS OF RELATIONS


[8 by 8 matrix with cause variables X1 through X8 on the rows and
effect variables X1 through X8 on the columns; each cell carries a
sign or 0. The cell entries are not recoverable from this copy.]


COLLECTING  THE  DATA 


The  next  step  in  the  forecast  process  is  to  collect  the  data  on  the  vari- 
ables. Under  this  phase  of  the  forecast  effort,  the  analyst  must  per- 
form three  tasks: 


• Choose  indicators  on  the  variables. 

• Operationalize  the  variables. 

• Collect  data  on  the  variables  or  indicators. 


Let us examine each of these tasks individually. The national power
base of the PRC is assumed to possess an economic dimension and a
military dimension. In order to derive an overall operational
measure of national power base, proper weights for each dimension are
derived. In this manner, national power base can be expressed as:

$$\text{national power base} = a_1\,(\text{economic power}) + a_2\,(\text{military power})$$

where a1 and a2 are judgmentally determined weights.


To obtain indicators of each component of national power, the analyst
should once again consult the experts. Possible indicators of each
dimension are listed in Table 9. The operationalization proceeds by
specifying precisely how the indicators are to be measured. Another
step in this phase of the forecast effort involves gathering data on the
indicators. The various sources of data and problems encountered in
data collection have been fairly well discussed,1 and there is no point in
repeating them here.

1 See Chapter 2 of this Guide. To obtain data on the aforementioned
indicators the analyst may consult the Handbook, Vol. II.


TABLE 9

LIST OF INDICATORS OF ECONOMIC AND MILITARY POWER


Economic Power                   Military Power

GNP                              Manpower in armed forces
GNP per capita                   Nuclear capability
Percent of GNP generated         Defense budget
  by industrial sector           Quantity of hardware


When data on the indicators have been collected, the analyst must
discover whether economic and military power are homogeneous or
heterogeneous variables. If they are homogeneous, then one indicator for
each variable would be sufficient to represent the variable. To
determine these characteristics of the indicators, the correlation
coefficients for each pair of indicators must be calculated either by
using a simple "canned" computer program or an electronic desk calculator.
For the purposes of this example, it will be assumed that the values
of the coefficients are close to one, and thus one indicator for economic
power and one for military power would be sufficient to represent
those two variables. The indicators picked are GNP and size of
defense budget, respectively. The national power base (NPB) of the
PRC is now operationalized as follows:

$$\text{NPB} = a_1\,(\text{GNP}) + a_2\,(\text{defense budget})$$

The national power base model can be expressed as:

... a function of total population (X3), energy consumption (X4),
estimated national resource endowment (X5), proportion of GNP
generated by the foreign sector (X6), gross steel output (X7), and
length of highway network (X8).

Of the aforementioned variables, X4 and X5 are endogenous. This
means that these variables are determined by the exogenous variables
of the forecast model. Symbolically, the forecast equation for national
power base is written as:

$$\text{NPB} = A_0 + A_1 X_3 + A_2 X_6 + A_3 X_7 + A_4 X_8 + e$$

where e represents unspecified effects.2


ESTIMATING  THE  PARAMETERS 


The  forecast  effort  at  this  point  involves  estimating  values  of  AQ,  A^, 


A2’  A3j  and  A4  usin§  Past  values  of  the  variables  (NPB,  X , X, , X , 
and  X8^  and  °ne  t^16  tec^na<lues  described  in  the  Guide.  Since 


ratio-level  data  are  available  for  all  the  variables  of  the  model. 


Regression  Analysis  can  be  used  to  estimate  the  values  of  the  para- 
meters. A hypothetical  estimated  equation  is: 


(1)  NPB  = 1280.  21  + 140.  89X,  + 125.  19X,  + 220.  18X 

3 6 7 


+ 9494. 1 2X  + e 
8 


e represents unspecified effects, i.e., errors that are introduced
into the equation from three possible sources--random noise in the
data, omitted variables, and errors that occur when linearizing
a nonlinear relationship.


See  Chapter  3 of  the  Guide. 
This equation is in fact the forecasting model. To obtain future
values of the national power base of the PRC, future values of X_3, X_6,
X_7, and X_8 are substituted into the equation. Thus, if the analyst
wants to forecast the value of the PRC's national power base in 1990,
he multiplies X_3 (the population of the PRC in 1990) by 140.89 and
adds it to the percentage of GNP generated by the foreign sector in 1990
multiplied by 125.19. This in turn is done for the other two variables--
gross steel output in 1990 and length of highway network in 1990.
Finally, 1280.21 is added to the total sum. This procedure first
requires estimating the values of X_3, X_6, X_7, and X_8 in 1990
(see footnote 5).
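The substitution described above can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, the coefficients are the hypothetical ones printed in equation (1), the input values are the 1990 entries of Table 10, and the error term e is left out because it is unknown for a future year. The result is expressed in whatever units the hypothetical coefficients imply, and no attempt is made here to reconcile it with Table 11.

```python
# Coefficients as printed in equation (1); the Guide calls them hypothetical.
INTERCEPT = 1280.21
COEFFICIENTS = {
    "population": 140.89,            # X_3, millions of people
    "foreign_sector_share": 125.19,  # X_6, percent of GNP
    "steel_output": 220.18,          # X_7, millions of tons
    "highway_length": 9494.12,       # X_8, thousands of miles
}


def forecast_npb(values):
    """Apply equation (1): intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * values[name] for name in COEFFICIENTS
    )


# 1990 entries of Table 10 (judgmentally determined forecasts).
values_1990 = {
    "population": 998,
    "foreign_sector_share": 10.2,
    "steel_output": 232,
    "highway_length": 173,
}

npb_1990 = forecast_npb(values_1990)
print(f"Equation (1) evaluated at the 1990 values: {npb_1990:,.2f}")
```

Keeping the coefficients in one table and the yearly values in another mirrors the two steps in the text: the parameters are estimated once, and the exogenous variables are then supplied for each forecast year.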

GENERATING  THE  FORECAST 

To generate the forecast, Consensus Methods are used to forecast
the values of the exogenous variables for the 1985-1990 period.

Table 10 presents the forecast values for population, percentage of GNP
generated by the foreign sector, gross steel output, and length of
highway network.

Forecast  values  of  the  exogenous  variables  are  now  plugged  into 
equation  (1)  to  yield  the  national  power  base  of  the  PRC  in  the 
1985-1990  period.  The  forecast  is  presented  in  Table  11. 


4 

We  are  assuming  that  this  equation  has  been  estimated  with  data 
that  cover  the  1950-1970  period. 

5 

This  estimation  was  discussed  under  the  heading  of  "The  Problem 
of  Forecasting  Exogenous  Variables"  in  the  Guide,  pp.  31-32. 
TABLE 10

JUDGMENTALLY DETERMINED
FORECAST OF THE EXOGENOUS VARIABLES


                    Variables
Year     X_3(a)    X_6(b)    X_7(c)    X_8(d)

1985      912        8.5      151       158
1986      930        8.3      166       162
1987      946        8.9      181       167
1988      96?        9.1      196       169
1989      963        9.-      215       171
1990      998       10.2      232       173

(a) in millions of people
(b) in percent
(c) in millions of tons
(d) in thousands of miles


TABLE 11

FORECAST VALUES OF THE
NATIONAL POWER BASE OF THE PRC (1985-1990)


Year     National Power Base(a)

1985      163
1986      170
1987      175
1988      180
1989      187
1990      194

(a) in billions of 1970 dollars


EVALUATING THE FORECAST


There is no a priori method whereby the analyst can check the
accuracy of his forecast. He can, however, utilize a procedure known as
postdiction, that is, evaluating the accuracy of his estimates by
recalculating past values of the national power base of the PRC using the
model and comparing these values to real values. There are two
types of postdiction that can be undertaken. The first involves using
the estimated values of the parameters and intercept to recreate the
national power base of the PRC over past periods (Postdiction Type I).
The second involves using a subset of the data to calculate estimates
of the parameters and intercept of the national power base equation
and comparing predicted versus actual values for another subset of
the data (Postdiction Type II). Let us consider each type of
postdiction individually.

In the aforementioned forecast example, it was assumed that the
parameters of the national power base equation were estimated by using
1950-1970 data on national power base, population, percent of GNP
generated by the foreign sector, gross steel output, and length of the
highway network. The first way the model can be evaluated is to use
these parameters to reestimate the values of national power base in
the 1950-1970 period. In this manner estimated national power base
can be compared to actual national power base. A hypothetical
comparison is presented in Figure 12.
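A minimal sketch of Postdiction Type I follows. It is not the Guide's own procedure or data: the two predictors and the "actual" series are made up for illustration, all names are hypothetical, and the regression is solved through the normal equations in plain Python, where a canned regression program would normally be used. The model is fitted to the whole 1950-1970 sample and then asked to recreate that same sample.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares estimates [A_0, A_1, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Evaluate the fitted equation for one observation."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Made-up data for 1950-1970: two illustrative predictors and an "actual"
# series built from a known linear rule plus a small alternating disturbance.
years = list(range(1950, 1971))
rows = [(10.0 + t, float((2 * t) % 5)) for t in range(len(years))]
actual = [
    3.0 + 2.0 * x1 + 0.5 * x2 + (0.3 if t % 2 == 0 else -0.3)
    for t, (x1, x2) in enumerate(rows)
]

# Type I: estimate on the full sample, then recreate the same sample.
coeffs = fit_ols(rows, actual)
postdicted = [predict(coeffs, r) for r in rows]
residuals = [a - p for a, p in zip(actual, postdicted)]
mean_abs_error = sum(abs(r) for r in residuals) / len(residuals)
print("estimated coefficients:", [round(c, 3) for c in coeffs])
print("mean absolute postdiction error:", round(mean_abs_error, 3))
```

Because the same observations are used for estimation and for comparison, the errors from this test tend to understate the errors of a genuine forecast.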

A second way to evaluate the model is to use a subset of data (e.g.,
data for 1950-1960) to generate another subset of data (e.g.,
predictions for the 1960-1970 period). A hypothetical comparison of this
type of test is presented in Figure 13. From the figure, it can be
seen that the error between the actual and predicted value of national
power base is higher than in Postdiction Type I. This is to be expected
since the Type I parameters are estimated from a larger data sample--
1950-1970 versus 1950-1960. Thus the larger the data sample, the
more precise will be the estimates.
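The split-sample test can be sketched in the same spirit. The example below is an assumption-laden illustration, not the Guide's data: it uses a single trend variable rather than the four predictors of equation (1), a made-up national power base series with mild curvature, and hypothetical names throughout. The text gives 1960-1970 as the prediction period; 1960 is dropped here so that the estimation and comparison subsets do not overlap.

```python
def fit_line(t, y):
    """Closed-form least-squares line y = intercept + slope * t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    return y_bar - slope * t_bar, slope


def mean_abs_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up series for 1950-1970, indexed by years elapsed since 1950.
elapsed = list(range(21))
npb = [100.0 + 5.0 * t + 0.1 * t ** 2 for t in elapsed]

# Type II: estimate on 1950-1960, compare predictions with 1961-1970.
b0, b1 = fit_line(elapsed[:11], npb[:11])
type2_error = mean_abs_error(npb[11:], [b0 + b1 * t for t in elapsed[11:]])

# Type I, for comparison: estimate on 1950-1970 and recreate 1950-1970.
c0, c1 = fit_line(elapsed, npb)
type1_error = mean_abs_error(npb, [c0 + c1 * t for t in elapsed])

print("Type I  mean absolute error:", round(type1_error, 2))
print("Type II mean absolute error:", round(type2_error, 2))
```

The curvature is built into the made-up series deliberately so that the split-sample error exceeds the full-sample error; real data need not behave this way, and the size of the gap depends on how well a straight line describes the period left out of the estimation.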


CONCLUSION


In  this  Guide,  we  have  developed  a procedure  for  empirical  research 
in  long-range  environmental  forecasting.  The  Research  Guide  will  be 
useful  to  analysts  who  are  familiar  with  the  material  included  in  CACI's 
General  Handbook  referred  to  earlier.  Many  of  the  estimation  procedures 
presented  in  the  Guide  require  the  use  of  computers  since  they  involve 
extensive  arithmetic  manipulations.  Thus  the  analyst  or  his  associates 
should be familiar with the basic operations of a computer or a
terminal tied to a computer. No knowledge of programming is required
since  most  of  the  programs  that  would  be  required  are  found  in  canned 
form  in  computer  libraries. 

To obtain reliable and precise long-range forecasts of the international
environment, three equally important items are required--techniques
of forecasting, theory and knowledge, and data. Techniques of
forecasting are by and large readily available. They have been developed
by mathematicians, statisticians, and economists. For the most part,
the analyst is faced only with identifying the techniques that are
appropriate to his needs. To help the analyst in this task, CACI has prepared
Volume I of the Handbook, which describes 11 important forecasting
techniques. Many national security analysts, however, are not fully
acquainted with international relations theory. To overcome this
drawback, the use of experts has been suggested. Finally, reliable and
precise forecasts of the international environment require data. To assist
defense analysts in this area, CACI has prepared Volume II of the
Handbook, which describes 300 data files containing over 7,000 variables.
1400 Wilson Blvd., Arlington, VA.


The  overall  purpose  of  this  Interim  Technical  Report  is  to  provide  the  basis  for 
the  improvement  of  long-range  environmental  forecasting  through  the  use  of 
quantitative  methods.  This  volume  provides  a step  by  step  procedure  that  analysts 
can  use  to  generate  forecasts  of  the  long-range  environment.  The  document  is 
based  on  and  should  be  used  in  conjunction  with  CACI's  A General  Handbook  for 
Long-Range  Environmental  Forecasting,  Interim  Technical  Report  No.  2 (February 
1973).